Transcription activator-like effector assembly

ABSTRACT

Described herein are techniques for assembling a polynucleotide encoding a transcription activator-like effector nuclease (TALEN). The techniques digest and ligate the modules required for TALEN assembly in a single reactor or system. Methods and kits for generating a TALEN are also described.

CROSS REFERENCE TO RELATED PATENT APPLICATIONS

This application claims priority to Chinese Patent Application No. 201210336604.4, filed on Sep. 12, 2012, entitled “A DNA library and a method for transcription activator-like effector nuclease plasmid assembly,” which is hereby incorporated by reference in its entirety.

SEQUENCE LISTING

The Sequence Listing associated with this application is provided in text format in lieu of a paper copy, and is hereby incorporated by reference into the specification. The name of the text file containing the Sequence Listing is Sequence_listing_5132-0002US.txt. The text file is about 27 KB, was created on Aug. 5, 2013, and is being submitted electronically via EFS-Web.

TECHNICAL FIELD

This disclosure relates to genome engineering. More specifically, the disclosure relates to designed transcription activator-like effector assemblies.

BACKGROUND

Targeted genome engineering is desirable for many scientists. By deleting or inserting a designed, specific nucleotide sequence in an endogenous genome, scientists can generate various animal models for fundamental biological research and for studying mechanisms of disease. In addition, scientists can create transgenic animals that produce biological compositions and/or components that may be difficult to obtain from other sources. However, it is challenging to perform targeted, specific genome modifications using traditional techniques, which rely on random exchanges of fragments between homologous chromosomes during natural cellular processes. As a result, the efficiency of the traditional techniques is low (e.g., a success rate of 10⁻⁶ to 10⁻⁸). Because of this low efficiency, these techniques are generally applied in mice rather than in other animal models (e.g., large mammals).

In 2009, two research groups identified a transcription activator-like effector (TALE) in the plant pathogen Xanthomonas that modulates host gene function by binding specific sequences within gene promoters. TALE-related techniques have since given scientists an easier method for targeted genome engineering, in which a TALE is fused to FokI to generate a transcription activator-like effector nuclease (TALEN). In general, TALEs include tandem, nearly identical monomers (i.e., repeat domains) flanked by N-terminal and C-terminal sequences. Each monomer contains 34 amino acids, and the sequence of each monomer is highly conserved. Only two amino acids per repeat (i.e., residues 12 and 13) are hypervariable; these are known as repeat variable di-residues (RVDs). The RVDs determine the nucleotide-binding specificity of each TALE repeat domain.

TALE-related techniques have increased the efficiency and broadened the uses of genome engineering, and have made genome engineering more convenient. However, assembling ten to twenty highly conserved DNA modules into a vector remains a major challenge.

BRIEF DESCRIPTION OF THE DRAWINGS

The detailed description refers to the accompanying figures.

FIG. 1A is a diagram showing an exemplary DNA library including dimer repeat modules.

FIG. 1B is a diagram showing an exemplary DNA library including monomer repeat modules.

FIG. 2 is a diagram showing an exemplary TALEN backbone plasmid.

FIG. 3 is a diagram showing another exemplary TALEN backbone plasmid.

FIG. 4 is an exemplary process showing the ligation of 19 modules.

FIG. 5 is an exemplary process showing a TALE assembly.

FIG. 6 is a diagram showing exemplary monomers and dimers for TALE assemblies.

FIG. 7 is a photograph of an agarose gel electrophoresis showing confirmation of assembly clones by restriction digestion analysis.

DETAILED DESCRIPTION

Various methods have been developed for assembling TALENs, such as chemical synthesis, two-step molecular cloning, and one-step molecular cloning. However, each of these methods has drawbacks. For example, although highly repetitive DNA sequences may be chemically synthesized, the cost is high and the outcome is hard to predict. Two-step molecular cloning is also expensive, considering the cost of materials and sequencing, as well as time-consuming. As for one-step molecular cloning, under current techniques the maximum number of DNA modules encoding a TALEN is 14 using dimer modules. However, although natural TALEs may include 12-23 repeat modules, designed TALEs generally include more than 14 repeat modules. Therefore, to generate a TALEN including more than 14 repeat modules, current techniques require multiple rounds of enzyme digestion, purification, and ligation. This not only limits the scope of TALEN-related genome engineering, but also affects TALEN specificity. In addition, it is challenging to properly store intermediate products (e.g., digested DNA segments with single-stranded tails). In sum, assembling a polynucleotide encoding a TALE including more than 14 repeat domains in a single cloning reaction has not been accomplished.

Methods involving conventional molecular biology techniques are described herein. Such techniques are generally known in the art unless otherwise specified in this disclosure. These techniques include PCR amplification and detection, cell transfection, cell culture, and detection techniques.

Embodiments of this disclosure relate to a transcription activator-like effector nuclease (TALEN) assembly library and/or kit, which can be used for ligation of multiple repeat DNA modules encoding TALENs. In certain embodiments, the number of repeat modules is greater than 14.

In certain related embodiments, the TAL assembly library may include 16 sets, and each set includes n dimers, wherein n is an integer. In some embodiments, the TAL assembly library may include 4 sets, and each set includes m monomers, wherein m is an integer not greater than n. As defined herein, a DNA module for TALE assembly may encode a single-nucleotide recognition domain, and is therefore referred to as a monomer DNA module (i.e., monomer). The single-nucleotide recognition domain includes two amino acids that recognize one of A, T, C, and G. In addition, a DNA module for TALE assembly may encode a double-nucleotide recognition domain, and is therefore referred to as a dimer DNA module (i.e., dimer), which includes amino acids that recognize one of AA, AT, AC, AG, TT, TA, TC, TG, CC, CA, CT, CG, GG, GA, GT, and GC. In some embodiments, the monomers in a set recognize the same single nucleotide, and the dimers in a set recognize the same pair of nucleotides.

In some embodiments, each dimer or monomer may contain a 1st overhang and a 2nd overhang that are generated by digestion with type II restriction endonucleases, such as BsaI, BsmBI, BsmAI, and BbsI. In some instances, the digestion and subsequent ligation are performed using only BsaI. In certain embodiments, the sequence of the 2nd overhang of a dimer (e.g., dimer i) may be complementary to the sequence of the 1st overhang of the dimer located after and adjacent to dimer i, wherein i is an integer not less than 1 and less than n. In certain embodiments, the sequence of the 2nd overhang of a monomer (e.g., monomer j) may be complementary to the sequence of the 1st overhang of the monomer located after and adjacent to monomer j, wherein j is an integer not less than 1 and less than m.
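To make the overhang rule concrete, the following Python sketch checks whether an ordered series of modules can anneal end to end. It assumes each overhang is written 5′ to 3′ on the strand that carries it as single-stranded DNA, so "complementary" means "equal to the reverse complement"; the function names and the example overhangs are hypothetical and are not taken from the library.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def chain_is_compatible(modules):
    """modules: ordered list of (first_overhang, second_overhang) pairs.

    Each overhang is written 5'->3' on the strand that carries it, so the
    second overhang of module i pairs with the first overhang of module
    i + 1 exactly when it is that overhang's reverse complement.
    """
    return all(
        revcomp(second) == next_first.upper()
        for (_, second), (next_first, _) in zip(modules, modules[1:])
    )


# Hypothetical overhangs: CCGT (bottom strand) pairs with ACGG (top strand).
print(chain_is_compatible([("CCCA", "CCGT"), ("ACGG", "TTGC")]))  # True
print(chain_is_compatible([("CCCA", "CCGT"), ("ACCC", "TTGC")]))  # False
```

Checking only adjacent pairs mirrors the text: each junction is defined solely by the 2nd overhang of one module and the 1st overhang of the next.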

For example, as illustrated in FIG. 5, each overhang consists of four nucleotides, read from 5′ to 3′. On the sense strand, the 2nd to 4th nucleotides of the overhang form a codon for Leu, and the 1st nucleotide is the last nucleotide of a codon for Gly. On the antisense strand, the last two nucleotides are the first two nucleotides of a codon for Gly, while the first three nucleotides are complementary to a codon for Leu.
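The overhang layout described above (last base of a Gly codon followed by a complete Leu codon) can be enumerated directly. This sketch assumes the standard genetic code (Gly: GGN; Leu: CTN, TTA, TTG), lists the resulting 4-nt sense-strand overhangs, and checks that none is palindromic, since a palindromic overhang could ligate to itself. It is an illustration only, not the procedure used to choose the primers.

```python
from itertools import product

GLY_CODONS = ["GGA", "GGC", "GGG", "GGT"]
LEU_CODONS = ["CTA", "CTC", "CTG", "CTT", "TTA", "TTG"]
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def candidate_overhangs():
    """Sense-strand 4-nt overhangs: last base of a Gly codon + a whole Leu codon."""
    return sorted({gly[-1] + leu for gly, leu in product(GLY_CODONS, LEU_CODONS)})


sites = candidate_overhangs()
palindromic = [s for s in sites if s == revcomp(s)]
print(len(sites), palindromic)  # 24 []
```

Four Gly codons and six Leu codons give 24 distinct overhangs, consistent with the 24 fusion sites mentioned in the Examples. None is self-complementary: a palindrome would need A as its second base, but the second base here is always the first base of a Leu codon (C or T).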

In some embodiments, dimers may be numbered 1, . . . , i, . . . , n, and monomers may be numbered 1, . . . , j, . . . , m. For example, when n is not less than 7, more than 14 modules are assembled; when n is 9 and m is 7, 19 modules are assembled.

In some embodiments, because DNA modules are not easy to store or to amplify on their own, DNA modules may be inserted into a plasmid, in a circular structure, for better storage and amplification.

Embodiments of this disclosure also relate to a DNA library including multiple DNA segments, each corresponding to a repeat domain of a TALE. In some embodiments, each DNA segment may contain a module component and one or more fusion components for fusion to another DNA segment. Each DNA segment may also have cutting sites for type II restriction endonucleases. Therefore, DNA segments may be digested with a type II restriction endonuclease to obtain DNA modules for TALE assembly. In certain embodiments, the DNA segments may be PCR amplification products or recombinant plasmids, such as pMD18-T, TOPO® plasmids, pUC19, and pUC18.

Embodiments of this disclosure also relate to methods for transcription activator-like effector nuclease plasmid assembly. In certain embodiments, the method may include identifying target gene sequences and designing corresponding TALENs (e.g., the repeat domains of a TALE). Based on the repeat domains, multiple DNA segments may be selected from a DNA library. In these instances, the multiple DNA segments, type II restriction endonucleases, DNA ligases, and a TALE backbone vector (e.g., a plasmid) may be mixed together in a single cloning reactor to generate a polynucleotide encoding a TALEN. For example, the multiple DNA segments may be inserted into a backbone plasmid that contains a polynucleotide encoding a DNA restriction enzyme. The polynucleotides encoding TALENs may be purified by removing incomplete ligation products (e.g., linear DNA segments) using a plasmid-safe deoxyribonuclease (DNase).

In some embodiments, individual DNA modules may be ligated to other DNA modules in a defined order. Once a module has been ligated to another module or to a plasmid, the type II restriction endonuclease may no longer be able to cut the ligated junction. In some embodiments, the multiple DNA segments (e.g., all DNA segments encoding a TALE), the backbone plasmids, the type II restriction endonucleases, and the DNA ligases may be placed in a single reactor to generate polynucleotides encoding TALENs, with digestion and ligation occurring at substantially the same time. In certain embodiments, the type II restriction endonuclease may be BsaI, and the DNA ligase may be T4 ligase.

For example, a single ligation or assembly reaction may include 40-200 ng of plasmid, 20-200 ng of each DNA segment, 0.5-2 μl of type II restriction endonuclease, 0.5-2 μl of DNA ligase, 2 μl of DNA ligation buffer, and double-distilled water (ddH₂O) to a final volume of 20 μl. The ligation program may include 15 cycles of 37° C. for 5 min and 16° C. for 10 min, followed by 80° C. for 10 min.
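As a quick check on scheduling, the hold times of this program add up as shown below. The sketch ignores ramp times between temperatures (an assumption), so an actual run on a thermocycler takes somewhat longer; `program_minutes` is a hypothetical helper.

```python
def program_minutes(cycles=15, cycle_steps_min=(5, 10), final_step_min=10):
    """Hold time of the digest/ligate program: cycles of 37 °C and 16 °C, then 80 °C."""
    return cycles * sum(cycle_steps_min) + final_step_min


total = program_minutes()
print(total, divmod(total, 60))  # 235 (3, 55)
```

About 3 h 55 min of hold time, which is in line with the roughly 4.5 hours allotted to ligation in the three-day timeline once ramping and setup are included.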

The polynucleotide sequence of a TALEN plasmid, which includes sequences encoding a DNA restriction enzyme and the N-terminal and C-terminal regions, may be as set forth in SEQ ID NO: 41 or SEQ ID NO: 42. During the process, the TALEN backbone plasmid may be cut by the type II restriction endonuclease to create a linear DNA segment with two overhangs. One overhang may be ligated to the 1st overhang of the first monomer or dimer, and the other overhang may be ligated to the 2nd overhang of the last monomer or dimer.
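The backbone-to-module pairing can be pictured as a walk: start at the overhang exposed by the left end of the cut backbone, repeatedly add the one module whose first overhang matches the open end, and stop when the open end matches the backbone's right end. This Python sketch assumes all overhangs are written on the sense strand (so matching ends are equal strings); `assembly_order` and the toy overhangs are hypothetical.

```python
def assembly_order(backbone, pool):
    """Return module names in the only order that closes the circle.

    backbone: (left_end, right_end) sense-strand overhangs of the cut vector.
    pool: {name: (left_overhang, right_overhang)}, also sense strand.
    """
    start, stop = backbone
    by_left = {}
    for name, (left, right) in pool.items():
        if left in by_left:
            raise ValueError(f"overhang {left} is shared by two modules")
        by_left[left] = (name, right)

    order, current = [], start
    while current != stop:
        if current not in by_left:
            raise ValueError(f"no module starts with overhang {current}")
        name, current = by_left.pop(current)  # each module is used once
        order.append(name)
    if by_left:
        raise ValueError("some modules were not incorporated")
    return order


pool = {"seg-2": ("ACTG", "TCTG"), "seg-1": ("CCCA", "ACTG"), "seg-3": ("TCTG", "GGTT")}
print(assembly_order(("CCCA", "GGTT"), pool))  # ['seg-1', 'seg-2', 'seg-3']
```

Because every overhang appears on exactly one module, the order is forced even though the pool is unordered, which is what allows all segments to be mixed in a single reaction.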

In some embodiments, incomplete products may be removed using Plasmid-Safe nucleases. Incomplete linear or linearized DNA segments reduce the ligation efficiency through recombination. In some instances, before the generated TALEN constructs are transformed, Plasmid-Safe™ ATP-Dependent DNase (Epicentre, cat. no. E3105K) may be used to digest linear or linearized DNA segments and thereby increase the ligation efficiency.

In certain embodiments, a designed TALEN may include 20 repeat domains, and thus a polynucleotide encoding the designed TALEN may be generated using 20 DNA modules from a DNA library for TALEN assembly. In certain embodiments, the DNA library for TALEN assembly may be obtained using appropriate primers. The DNA library may contain multiple DNA modules (e.g., 172 modules). These DNA modules may be monomers, each corresponding to a TALE recognition module that recognizes one nucleotide, and/or dimers, each corresponding to two TALE recognition modules that recognize two nucleotides. Each of the monomers and dimers contains type II restriction endonuclease cutting sites. By using this DNA library, enzyme digestion and ligation (e.g., 19-module ligation) may be performed in one reactor or system, thereby avoiding purification and additional ligation steps. This increases production efficiency and thus improves TALE-related techniques. In some embodiments, because the DNA modules are plasmids or corresponding PCR products, certain risks (e.g., damage to the ends and DNA degradation) are avoided. This simplifies TALEN generation procedures and therefore reduces the cost.

In some embodiments, a polynucleotide encoding a TALEN including 20 repeat domains may be assembled in a single reactor or system. For example, an individual TALE repeat module may recognize one of the 4 nucleotides (A, T, C, and G), or a pair of repeat modules may recognize one of the 16 dinucleotides (AA, AT, AC, AG, TA, TT, TC, TG, CA, CT, CC, CG, GA, GT, GC, and GG). Accordingly, the RVDs of the TALE repeat module may be NI, NG, HD, or NN if the module recognizes one nucleotide, or NI-NI, NI-NG, NI-HD, NI-NN, NG-NI, NG-NG, NG-HD, NG-NN, HD-NI, HD-NG, HD-HD, HD-NN, NN-NI, NN-NG, NN-HD, or NN-NN if the module recognizes two nucleotides. Exemplary sequences of polynucleotides encoding the TALE repeat modules are listed in Table 1.

TABLE 1  Name Sequence SEQ ID NICTGACCCCAGAGCAGGTCGTGGCAATCGCCTCCAACATTGGCGG SEQ ID NO: 1GAAACAGGCACTCGAGACTGTCCAGCGCCTGCTTCCCGTGCTGTG CCAAGCGCACGGA NGCTGACCCCAGAGCAGGTCGTGGCCATTGCCTCGAATGGAGGGGG SEQ ID NO: 2CAAACAGGCGTTGGAAACCGTACAACGATTGCTGCCGGTGCTGT GCCAAGCGCACGGC HDTTGACCCCAGAGCAGGTCGTGGCGATCGCAAGCCACGACGGAGG SEQ ID NO: 3AAAGCAAGCCTTGGAAACAGTACAGAGGCTGTTGCCTGTGCTGT GCCAAGCGCACGGG NNCTTACCCCAGAGCAGGTCGTGGCAATCGCGAGCAATAACGGCGG SEQ ID NO: 4AAAACAGGCTTTGGAAACGGTGCAGAGGCTCCTTCCAGTGCTGT GCCAAGCGCACGGG NI-NICTGACCCCAGAGCAGGTCGTGGCAATCGCCTCCAACATTGGCGG SEQ ID NO: 5GAAACAGGCACTCGAGACTGTCCAGCGCCTGCTTCCCGTGCTTTGTCAGGCACACGGCCTCACTCCGGAACAAGTGGTCGCAATCGCCTCCAACATTGGCGGGAAACAGGCACTCGAGACTGTCCAGCGCCTGC TTCCCGTGCTGTGCCAAGCGCACGGTNI-NG CTGACCCCAGAGCAGGTCGTGGCAATCGCCTCCAACATTGGCGG SEQ ID NO: 6GAAACAGGCACTCGAGACTGTCCAGCGCCTGCTTCCCGTGCTTTGTCAGGCACACGGCCTCACTCCGGAACAAGTGGTCGCCATTGCCTCGAATGGAGGGGGCAAACAGGCGTTGGAAACCGTACAACGATTG CTGCCGGTGCTGTGCCAAGCGCACGGTNI-HD CTGACCCCAGAGCAGGTCGTGGCAATCGCCTCCAACATTGGCGG SEQ ID NO: 7GAAACAGGCACTCGAGACTGTCCAGCGCCTGCTTCCCGTGCTTTGTCAGGCACACGGCCTCACTCCGGAACAAGTGGTCGCGATCGCAAGCCACGACGGAGGAAAGCAAGCCTTGGAAACAGTACAGAGGCT GTTGCCTGTGCTGTGCCAAGCGCACGGTNI-NN CTGACCCCAGAGCAGGTCGTGGCAATCGCCTCCAACATTGGCGG SEQ ID NO: 8GAAACAGGCACTCGAGACTGTCCAGCGCCTGCTTCCCGTGCTTTGTCAGGCACACGGCCTCACTCCGGAACAAGTGGTCGCAATCGCGAGCAATAACGGCGGAAAACAGGCTTTGGAAACGGTGCAGAGGCT CCTTCCAGTGCTGTGCCAAGCGCACGGTNG-NI CTGACCCCAGAGCAGGTCGTGGCCATTGCCTCGAATGGAGGGGG SEQ ID NO: 9CAAACAGGCGTTGGAAACCGTACAACGATTGCTGCCGGTGCTTTGTCAGGCACACGGCCTCACTCCGGAACAAGTGGTCGCAATCGCCTCCAACATTGGCGGGAAACAGGCACTCGAGACTGTCCAGCGCCTGC TTCCCGTGCTGTGCCAAGCGCACGGTNG-NG CTGACCCCAGAGCAGGTCGTGGCCATTGCCTCGAATGGAGGGGG SEQ ID NO: 10CAAACAGGCGTTGGAAACCGTACAACGATTGCTGCCGGTGCTTTGTCAGGCACACGGCCTCACTCCGGAACAAGTGGTCGCCATTGCCTCGAATGGAGGGGGCAAACAGGCGTTGGAAACCGTACAACGATTG CTGCCGGTGCTGTGCCAAGCGCACGGTNG-HD CTGACCCCAGAGCAGGTCGTGGCCATTGCCTCGAATGGAGGGGG SEQ ID NO: 
11CAAACAGGCGTTGGAAACCGTACAACGATTGCTGCCGGTGCTTTGTCAGGCACACGGCCTCACTCCGGAACAAGTGGTCGCGATCGCAAGCCACGACGGAGGAAAGCAAGCCTTGGAAACAGTACAGAGGCT GTTGCCTGTGCTGTGCCAAGCGCACGGTNG-NN CTGACCCCAGAGCAGGTCGTGGCCATTGCCTCGAATGGAGGGGG SEQ ID NO: 12CAAACAGGCGTTGGAAACCGTACAACGATTGCTGCCGGTGCTTTGTCAGGCACACGGCCTCACTCCGGAACAAGTGGTCGCAATCGCGAGCAATAACGGCGGAAAACAGGCTTTGGAAACGGTGCAGAGGCT CCTTCCAGTGCTGTGCCAAGCGCACGGTHD-NI CTGACCCCAGAGCAGGTCGTGGCGATCGCAAGCCACGACGGAG SEQ ID NO: 13GAAAGCAAGCCTTGGAAACAGTACAGAGGCTGTTGCCTGTGCTTTGTCAGGCACACGGCCTCACTCCGGAACAAGTGGTCGCAATCGCCTCCAACATTGGCGGGAAACAGGCACTCGAGACTGTCCAGCGCCTG CTTCCCGTGCTGTGCCAAGCGCACGGTHD-NG CTGACCCCAGAGCAGGTCGTGGCGATCGCAAGCCACGACGGAG SEQ ID NO: 14GAAAGCAAGCCTTGGAAACAGTACAGAGGCTGTTGCCTGTGCTTTGTCAGGCACACGGCCTCACTCCGGAACAAGTGGTCGCCATTGCCTCGAATGGAGGGGGCAAACAGGCGTTGGAAACCGTACAACGATT GCTGCCGGTGCTGTGCCAAGCGCACGGTHD-HD CTGACCCCAGAGCAGGTCGTGGCGATCGCAAGCCACGACGGAG SEQ ID NO: 15GAAAGCAAGCCTTGGAAACAGTACAGAGGCTGTTGCCTGTGCTTTGTCAGGCACACGGCCTCACTCCGGAACAAGTGGTCGCGATCGCAAGCCACGACGGAGGAAAGCAAGCCTTGGAAACAGTACAGAGGCTGTTGCCTGTGCTGTGCCAAGCGCACGGT HD-NNCTCACCCCAGAGCAGGTCGTGGCGATCGCAAGCCACGACGGAGG SEQ ID NO: 16AAAGCAAGCCTTGGAAACAGTACAGAGGCTGTTGCCTGTGCTTTGTCAGGCACACGGCCTCACTCCGGAACAAGTGGTCGCAATCGCGAGCAATAACGGCGGAAAACAGGCTTTGGAAACGGTGCAGAGGCT CCTTCCAGTGCTGTGCCAAGCGCACGGANN-NI CTGACCCCAGAGCAGGTCGTGGCAATCGCGAGCAATAACGGCGG SEQ ID NO: 17AAAACAGGCTTTGGAAACGGTGCAGAGGCTCCTTCCAGTGCTTTGTCAGGCACACGGCCTCACTCCGGAACAAGTGGTCGCAATCGCCTCCAACATTGGCGGGAAACAGGCACTCGAGACTGTCCAGCGCCTGC TTCCCGTGCTGTGCCAAGCGCACGGTNN-NG CTGACCCCAGAGCAGGTCGTGGCAATCGCGAGCAATAACGGCGG SEQ ID NO: 18AAAACAGGCTTTGGAAACGGTGCAGAGGCTCCTTCCAGTGCTTTGTCAGGCACACGGCCTCACTCCGGAACAAGTGGTCGCCATTGCCTCGAATGGAGGGGGCAAACAGGCGTTGGAAACCGTACAACGATTG CTGCCGGTGCTGTGCCAAGCGCACGGTNN-HD CTGACCCCAGAGCAGGTCGTGGCAATCGCGAGCAATAACGGCGG SEQ ID NO: 19AAAACAGGCTTTGGAAACGGTGCAGAGGCTCCTTCCAGTGCTTTGTCAGGCACACGGCCTCACTCCGGAACAAGTGGTCGCGATCGCAAGCCACGACGGAGGAAAGCAAGCCTTGGAAACAGTACAGAGGCT GTTGCCTGTGCTGTGCCAAGCGCACGGTNN-NN 
CTGACCCCAGAGCAGGTCGTGGCAATCGCGAGCAATAACGGCGG SEQ ID NO: 20AAAACAGGCTTTGGAAACGGTGCAGAGGCTCCTTCCAGTGCTTTGTCAGGCACACGGCCTCACTCCGGAACAAGTGGTCGCAATCGCGAGCAATAACGGCGGAAAACAGGCTTTGGAAACGGTGCAGAGGCT CCTTCCAGTGCTGTGCCAAGCGCACGGT
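The correspondence between nucleotides, RVDs, and Table 1 entries can be expressed as a small lookup. This sketch assumes the standard code in which NI, NG, HD, and NN recognize A, T, C, and G, respectively (the order in which the text lists them), and that Table 1 orders monomers and dimer halves as NI, NG, HD, NN; `rvd_name` and `table1_seq_id` are hypothetical helpers.

```python
# Assumed RVD code: A -> NI, T -> NG, C -> HD, G -> NN.
RVD = {"A": "NI", "T": "NG", "C": "HD", "G": "NN"}
# Order in which Table 1 lists monomers (SEQ ID NOs: 1-4) and dimer halves (5-20).
TABLE1_ORDER = ["NI", "NG", "HD", "NN"]


def rvd_name(bases):
    """RVD label of a monomer ('C' -> 'HD') or a dimer ('CG' -> 'HD-NN')."""
    bases = bases.upper()
    if len(bases) not in (1, 2) or set(bases) - set(RVD):
        raise ValueError("a module recognizes one or two of A, C, G, T")
    return "-".join(RVD[b] for b in bases)


def table1_seq_id(bases):
    """SEQ ID NO of the Table 1 entry encoding the module that reads `bases`."""
    idx = [TABLE1_ORDER.index(r) for r in rvd_name(bases).split("-")]
    return 1 + idx[0] if len(idx) == 1 else 5 + 4 * idx[0] + idx[1]


for target in ("A", "G", "CG", "TC"):
    print(target, rvd_name(target), table1_seq_id(target))
```

For example, CG maps to HD-NN (SEQ ID NO: 16) and TC to NG-HD (SEQ ID NO: 11), matching the rows of Table 1.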

EXAMPLES

A DNA library including 172 DNA segments was established by modifying the TALE repeat modules described above. PCR amplification was used to add restriction enzyme cutting sites and adaptors. For dimers, PCR was performed using T-vectors containing the 16 dimers and the primer pairs F1 and R1, F2 and R2, F3 and R3, F4 and R4, F5 and R5, F6 and R6, F7 and R7, F8 and R8, and F9 and R9, giving 144 (i.e., 16×9) PCR products. For monomers, PCR was performed using T-vectors containing the 4 monomers and the primer pairs F1 and R1, F2 and R2, F3 and R3, F4 and R4, F5 and R5, F6 and R6, and F7 and R7, giving 28 (i.e., 4×7) PCR products. Thus, the DNA library includes 172 PCR products (i.e., 144 plus 28). Exemplary sequences of the primer pairs F1 and R1 through F9 and R9 are listed in Table 2, in which lowercase letters indicate BsaI recognition sites.
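The library arithmetic (16 dimers × 9 primer pairs plus 4 monomers × 7 primer pairs) can be reproduced by enumerating segment names. The `<nucleotides>-<position>` naming follows labels such as CG-1 and TC-4 used later in the Examples; treating the position number as the primer-pair index is an assumption of this sketch.

```python
from itertools import product

BASES = "ATCG"
DIMER_POSITIONS = range(1, 10)   # primer pairs F1/R1 ... F9/R9
MONOMER_POSITIONS = range(1, 8)  # primer pairs F1/R1 ... F7/R7

dimers = ["".join(pair) for pair in product(BASES, repeat=2)]  # 16 dinucleotides
library = (
    [f"{d}-{k}" for k in DIMER_POSITIONS for d in dimers]
    + [f"{m}-{k}" for k in MONOMER_POSITIONS for m in BASES]
)
print(len(dimers), len(library))  # 16 172
```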

TABLE 2
Name | Sequence | SEQ ID
TALE-F1 | AATGGACGACCCGGCTTGATAggtctcCCCCAGAGCAGGTCGTGG | SEQ ID NO: 21
TALE-R1 | CATCACAGGTAGCTCGCTGGAggtctcTCCGTGCGCTTGGCAC | SEQ ID NO: 22
TALE-F2 | ATCGATCGATCGCGATCGATCggtctcGACCCCAGAGCAGGTCGTG | SEQ ID NO: 23
TALE-R2 | GCAGCCACGGCTAGCTTAAGCggtctcTCCGTGCGCTTGGCAC | SEQ ID NO: 24
TALE-F3 | ATCGATCGATCGCGATCGATCggtctcGACCCCAGAGCAGGTCGTG | SEQ ID NO: 25
TALE-R3 | GAACCGCCGTCTTACGTAGAGggtctcTCCGTGCGCTTGGCAC | SEQ ID NO: 26
TALE-F4 | TTTAGCCCGTACCGTAGCCTAggtctcGACCCCAGAGCAGGTCGTG | SEQ ID NO: 27
TALE-R4 | TTGCACCGGTATCGTCGAGGCggtctcTCCGTGCGCTTGGCAC | SEQ ID NO: 28
TALE-F5 | AAGCATGGATCGCAAGGGTTGggtctcGACCCCAGAGCAGGTCGTG | SEQ ID NO: 29
TALE-R5 | GGGTTGCGCTCGCAATTACCGggtctcTCCGTGCGCTTGGCAC | SEQ ID NO: 30
TALE-F6 | CGAAATCCGACCGGATGCCTAggtctcGACCCCAGAGCAGGTCGTG | SEQ ID NO: 31
TALE-R6 | GCCATCGCGTCGCACGAAGCTggtctcTCCGTGCGCTTGGCAC | SEQ ID NO: 32
TALE-F7 | ATAGCTGGTAGGGCTACGGGCggtctcGACCCCAGAGCAGGTCGTG | SEQ ID NO: 33
TALE-R7 | GAACGACCCCTGACAGTCGTTggtctcTCCGTGCGCTTGGCAC | SEQ ID NO: 34
TALE-F8 | CGATATCGATCGCCTTACGCggtctcGACCCCAGAGCAGGTCGTG | SEQ ID NO: 35
TALE-R8 | CGCCACATATATAGCGCGTCCggtctcTCCGTGCGCTTGGCAC | SEQ ID NO: 36
TALE-F9 | GTGTGACGGCTAGCCTAGTAggtctcGACCCCAGAGCAGGTCGTG | SEQ ID NO: 37
TALE-R9 | GCTTGCGGATCGATAGCATGGggtctcTCCGTGCGCTTGGCAC | SEQ ID NO: 38

For each PCR, approximately 1 μl of plasmid was mixed with a solution containing 0.2 μl of primers (0.1 μl of each primer of the pair), 1.5 μl of buffer, 0.8 μl of dNTPs, 0.35 μl of MgSO₄, 11.48 μl of ddH₂O, and 1 unit of DNA polymerase. The following PCR program was used: 36 cycles; 95° C. for 2 min, 95° C. for 15 sec, 55.8° C. for 30 sec, 68° C. for 30 sec, 68° C. for 2 sec, followed by 68° C. for 1 min.

All 18 primers contain a BsaI site: GGTCTCN′NNNN (SEQ ID NO: 49), wherein N represents any nucleotide and ′ marks the cleavage position on the primer strand. BsaI is a type II restriction endonuclease that cuts outside its recognition sequence, so a single recognition site can generate different overhangs depending on the flanking sequence. Using this type II restriction endonuclease, 24 candidate fusion sites were derived from the 4 codons for Gly and the 6 codons for Leu, and 10 of these 24 were selected for primer design. Except for F1 and R9, the product of primer Fk ligates specifically to the product of primer R(k-1), and not to the products of other primers, wherein k is an integer from 2 to 9.
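Because BsaI cuts one base past GGTCTC on the primer strand, the 4-nt overhang a primer contributes can be read straight from its sequence. The sketch below assumes a single forward-oriented site per primer and applies the GGTCTCN′NNNN pattern to TALE-F1 and TALE-R1 from Table 2; `bsai_overhang` is a hypothetical helper.

```python
BSAI_SITE = "GGTCTC"  # recognition sequence; cleavage pattern GGTCTC N'NNNN


def bsai_overhang(primer):
    """4-nt overhang BsaI leaves next to the first GGTCTC in `primer`.

    Read 5'->3' on the primer strand: skip the site and one spacer base N,
    then take the next four bases.
    """
    seq = primer.upper()
    start = seq.find(BSAI_SITE)
    if start < 0:
        raise ValueError("no BsaI site on this strand")
    cut = start + len(BSAI_SITE) + 1
    overhang = seq[cut:cut + 4]
    if len(overhang) < 4:
        raise ValueError("primer too short after the BsaI site")
    return overhang


TALE_F1 = "AATGGACGACCCGGCTTGATAggtctcCCCCAGAGCAGGTCGTGG"
TALE_R1 = "CATCACAGGTAGCTCGCTGGAggtctcTCCGTGCGCTTGGCAC"
print(bsai_overhang(TALE_F1), bsai_overhang(TALE_R1))  # CCCA CCGT
```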

The 172 PCR products were purified by gel extraction and ligated into pMD18-T plasmids. The 20 original modules were ligated into pMD18-T (Takara) as follows. First, 2.7 μl of PCR product was mixed with 3 μl of Solution I and 0.3 μl of pMD18-T. The mixture was then incubated at 16° C. for 2 hours, transformed into DH5α, and streaked onto LB plates containing kanamycin. Colonies were selected, and plasmids were isolated. The inserts were verified by PCR and sequencing. Finally, a plasmid library containing 172 plasmids was established, as illustrated in FIG. 1.

A PCR product library was generated using assem-F and assem-R as primers (sequences in Table 3) and the plasmids of the 172-plasmid library as PCR templates. The primer binding sites are 400 bp upstream and downstream of the polynucleotides encoding the individual TALE repeat modules. Accordingly, the PCR products are about 1050 bp for dimers and about 950 bp for monomers.

TABLE 3
Name | Sequence | SEQ ID
assem-F | TGTTGTGTGGAATTGTGAGCGGATAAC | SEQ ID NO: 39
assem-R | TGCAAGGCGATTAAGTTGGGTAACG | SEQ ID NO: 40

For PCR amplification (50 μl), 0.5 μl of DNA template (about 50 ng) was mixed with a solution containing 0.3 μl (50 μM) of each primer, 0.25 μl of Pfx polymerase (Invitrogen), 5 μl of 10× buffer, 2.5 μl of dNTPs (2.5 μM), 1 μl of MgSO₄, and 40.15 μl of ddH₂O. The following PCR program was used: 36 cycles; 95° C. for 2 min, 95° C. for 15 sec, 68° C. for 30 sec, 68° C. for 50 sec, followed by 68° C. for 5 min.

The PCR products were purified using DNA purification kits (Tiangen), and their concentrations were measured by agarose gel electrophoresis. The enzyme digestion sites of two TALEN plasmids, pEF1a-NLS-TALE backbone-Fok1(R)-pA and pEF1a-NLS-TALE backbone-Fok1(L)-IRES-PURO-pA, are illustrated in FIGS. 2 and 3, respectively. Their sequences are shown as SEQ ID NO: 41 and SEQ ID NO: 42. The sequences of the N-terminal and C-terminal regions of the transcription activator-like effectors are shown as SEQ ID NO: 43 and SEQ ID NO: 44. Before ligation, the TALEN vectors were digested with BsaI to create overhangs for the repeat modules. The digested TALEN vectors were purified by gel extraction, and their concentrations were determined by gel electrophoresis.

With respect to TALEN ligation, except for F1 and R9 (F1 ligates to the left end of the TALEN vector, and R9 ligates to the right end of the backbone vector), Fk can ligate to R(k-1) at the overhangs, but not to others. After ligation, BsaI cannot cut the modules or the backbone vector again, because the BsaI recognition sites are removed together with the flanking sequences during digestion.
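Why a correctly ligated product is not recut can be shown with a toy digestion. The sketch below models only the sense strand, assumes each PCR-tailed module carries GGTCTC on its left and GAGACC (a BsaI site on the opposite strand) on its right, and uses short made-up sequences rather than real modules; `bsai_insert` and `ligate` are hypothetical helpers.

```python
SITE, SITE_RC = "GGTCTC", "GAGACC"  # BsaI site on the top / bottom strand


def bsai_insert(fragment):
    """Digest a PCR-tailed module with BsaI (sense-strand model).

    Assumes one GGTCTC left of the insert and one GAGACC right of it.
    Returns (kept top strand, left overhang, right overhang); the right
    overhang is the 4 bases left single-stranded on the bottom strand.
    """
    seq = fragment.upper()
    i = seq.find(SITE)      # top strand cut after N1, bottom after N5
    j = seq.rfind(SITE_RC)  # mirror image: bottom cut 1 base, top 5 bases to the left
    if i < 0 or j < 0 or i + 11 > j - 5:
        raise ValueError("expected GGTCTC ... GAGACC around the insert")
    return seq[i + 7:j - 5], seq[i + 7:i + 11], seq[j - 5:j - 1]


def ligate(fragments):
    """Digest and join modules in order, checking that adjacent overhangs pair."""
    parts = [bsai_insert(f) for f in fragments]
    for (_, _, right), (_, left, _) in zip(parts, parts[1:]):
        if right != left:
            raise ValueError(f"overhang {right} cannot pair with {left}")
    return "".join(top for top, _, _ in parts)


# Toy modules (made up): flank + site + N + overhang + body + overhang + N + site + flank
module_a = "TT" + "GGTCTC" + "A" + "CTGA" + "CCCAGAGCAG" + "ACGG" + "A" + "GAGACC" + "TT"
module_b = "TT" + "GGTCTC" + "A" + "ACGG" + "GTCGTGGCA" + "TTCC" + "A" + "GAGACC" + "TT"
product = ligate([module_a, module_b])
print(product, SITE in product, SITE_RC in product)
```

For these toy modules the joined product is CTGACCCAGAGCAGACGGGTCGTGGCA, which contains neither GGTCTC nor GAGACC, so BsaI has nothing left to cut. Reversing the order raises an error because TTCC cannot pair with CTGA.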

FIG. 4 illustrates a process for assembling a TALEN containing 19 repeat segments. As illustrated, the final half repeat segment, which encodes a module recognizing T, is already present in the backbone vector; thus, ligation of 18 modules is sufficient. Nine DNA segments may be selected based on the target sequence and mixed with a solution containing the TALEN backbone vector, BsaI, and T4 ligase, so that digestion and ligation take place in the same reactor or system.

The following assembly system was used: 150 ng of vector, 50 ng of each DNA segment, 1 μl of BsaI (NEB), 1 μl of T4 ligase (Fermentas), 2 μl of T4 buffer (NEB), and double-distilled water (ddH₂O) to a final volume of 20 μl. The following ligation program was used: 15 cycles of 37° C. for 5 min and 16° C. for 10 min, followed by 80° C. for 10 min.
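Pipetting volumes for this reaction depend on the DNA stock concentrations, which are not specified here. The sketch below computes the DNA volumes and the water needed to reach 20 μl; the stock concentrations in the example (150 ng/μl vector, 50 ng/μl segments) and the helper `assembly_volumes` are hypothetical.

```python
def assembly_volumes(dna_ng, stock_ng_per_ul, enzymes_ul=(1.0, 1.0), buffer_ul=2.0,
                     final_ul=20.0):
    """Return ({component: μl}, water μl) for a one-pot BsaI/T4 ligase reaction.

    dna_ng: ng of each DNA component; stock_ng_per_ul: its stock concentration.
    enzymes_ul defaults to 1 μl BsaI + 1 μl T4 ligase; buffer_ul to 2 μl T4 buffer.
    """
    dna_ul = {name: dna_ng[name] / stock_ng_per_ul[name] for name in dna_ng}
    used = sum(dna_ul.values()) + sum(enzymes_ul) + buffer_ul
    water = round(final_ul - used, 2)
    if water < 0:
        raise ValueError(f"components need {used:.2f} μl, more than {final_ul} μl")
    return dna_ul, water


# Hypothetical stocks: vector at 150 ng/μl, nine segments at 50 ng/μl each.
segments = {f"seg-{k}": 50 for k in range(1, 10)}
dna_ng = {"vector": 150, **segments}
stocks = {"vector": 150, **{name: 50 for name in segments}}
volumes, water = assembly_volumes(dna_ng, stocks)
print(volumes["vector"], water)  # 1.0 6.0
```

Raising an error when the water volume would be negative flags stocks that are too dilute to fit nine segments plus vector into 20 μl.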

If incomplete ligation occasionally occurs (e.g., only 1 to 8 modules are ligated), the incomplete products may reduce the ligation efficiency through recombination. Thus, before transformation, a Plasmid-Safe™ ATP-Dependent DNase (Epicentre, cat. no. E3105K) may be used to digest the linear DNA. To remove the linear DNA, 1 μl of Plasmid-Safe DNase and 0.5 μl of ATP were added to the 20 μl ligation system, followed by an additional incubation at 37° C. for 1 hour. Then, 10 μl of the ligation products were used to transform Trans-T1 competent cells. Colonies were selected, and vectors were isolated. Restriction analysis was performed using BamHI/PstI. The expected size of the smaller fragment is the length of the ligated insert plus 550 bp. The final products were sent for sequencing. Exemplary sequencing primers are listed in Table 4.
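As a rough aid to reading the BamHI/PstI digest, the size rule above can be turned into an estimate. The sketch assumes that the ligated length equals the number of ligated repeat modules times 102 bp (34 codons per repeat, with no extra linker); the helper name is hypothetical, and actual band sizes should be checked against the vector maps.

```python
BP_PER_REPEAT = 34 * 3  # 34 amino acids per repeat -> 102 bp (assumes no extra linker)


def expected_small_fragment_bp(n_ligated_modules, offset_bp=550):
    """Estimated size of the smaller BamHI/PstI fragment: ligated length + 550 bp."""
    return n_ligated_modules * BP_PER_REPEAT + offset_bp


# Nine dimer segments ligate 18 repeat modules (the final half repeat is in the backbone).
print(expected_small_fragment_bp(18))  # 2386
```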

TABLE 4
Name | Sequence | SEQ ID
Sequence-F | CTCCCCTCAGCTGGACAC | SEQ ID NO: 45
Sequence-R | AGCTGGGCCACGATTGAC | SEQ ID NO: 46

Embodiments of this disclosure allow sequence-confirmed TALEN vectors to be obtained within 3 days. For example, the ligation (4.5 hours), Plasmid-Safe DNase digestion (1 hour), and transformation (1 hour) may be performed on the first day. Colony selection and bacterial inoculation may be performed on day 2. Finally, the sequencing results may be received on day 3. If the target sequence is 12 to 18 nucleotides long rather than 19, the modules located in the front part can be changed from dimers to monomers; each dimer replaced by a monomer reduces the number of repeat modules by one. Exemplary options for different monomers or dimers specific to the targeted nucleotide(s) are shown in FIG. 6.
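The dimer-to-monomer substitution can be sketched as a small planner. It assumes, for illustration, that the final T is read by the half repeat already in the backbone, that positions 1 to 9 each receive one segment, and that monomers (available only for positions 1 to 7) are placed at the front. For Sequence 1 in Table 5 it returns CG-1 through CG-9, the segments used in the Examples; it does not reproduce the segments listed for Sequence 2, which were chosen differently, so it shows only one possible layout. `plan_segments` is a hypothetical helper.

```python
def plan_segments(target, positions=9, max_monomer_positions=7):
    """Choose library segments named '<bases>-<position>' for a target sequence.

    Hypothetical rules: the final T is read by the backbone half repeat;
    positions 1..9 each receive one segment; monomers go at the front and
    exist only for positions 1..7.
    """
    target = target.upper()
    if set(target) - set("ACGT") or not target.endswith("T"):
        raise ValueError("target must be A/C/G/T and end with T")
    body = target[:-1]
    n_monomers = 2 * positions - len(body)  # each monomer covers one base fewer
    if not 0 <= n_monomers <= max_monomer_positions:
        shortest = 2 * positions - max_monomer_positions + 1
        raise ValueError(f"target length must be {shortest} to {2 * positions + 1} nt")
    segments, i = [], 0
    for pos in range(1, positions + 1):
        width = 1 if pos <= n_monomers else 2
        segments.append(f"{body[i:i + width]}-{pos}")
        i += width
    return segments


print(plan_segments("CGCGCGCGCGCGCGCGCGT"))  # Sequence 1 in Table 5
print(plan_segments("ACGTACGTACGTACGT"))     # hypothetical 16-nt target
```

With nine positions and up to seven monomers, accepted targets run from 12 to 19 nucleotides, matching the range stated above.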

In some embodiments, polynucleotides encoding TALENs that target certain sequences may be assembled in a single reaction. Examples of such sequences are listed in Table 5.

TABLE 5
Name | Sequence | SEQ ID
Sequence 1 | CGCGCGCGCGCGCGCGCGT | SEQ ID NO: 47
Sequence 2 | CCCACTCCCCATCCAGT | SEQ ID NO: 48

In these instances, DNA segments encoding repeat modules were selected from the PCR library. For example, for Sequence 1, the DNA segments corresponding to CG-1, CG-2, CG-3, CG-4, CG-5, CG-6, CG-7, CG-8, and CG-9 were chosen, and TALEN vectors containing pEF1a-NLS-TALE backbone-Fok1(R)-pA were used. For Sequence 2, the DNA segments corresponding to C-1, A-2, C-3, TC-4, CC-5, CA-6, TC-7, CA-8, and GT-9 were chosen, and TALEN vectors containing pEF1a-NLS-TALE backbone-Fok1(L)-IRES-PURO-pA were used. The following assembly system was used: 150 ng of vector, 50 ng of each module, 1 μl of BsaI (NEB), 1 μl of T4 ligase (Fermentas), 2 μl of T4 buffer (NEB), and H₂O to a final volume of 20 μl. The following ligation program was used: 15 cycles of 37° C. for 5 min and 16° C. for 10 min, followed by 80° C. for 10 min.

The ligation products were treated with Plasmid-Safe DNase for 1 hour. The products (plasmids) were then transformed into Trans-T1 chemically competent cells. The plasmids were isolated and analyzed by BamHI/EcoRI restriction digestion and gel electrophoresis. FIG. 7 is a photograph of an agarose gel electrophoresis showing confirmation of assembly clones by restriction digestion analysis. As illustrated, the 1 kb DNA marker is in the middle lane; ligation I, to the right of the marker, shows bands of 3.1 kb and 2.2 kb; and ligation II, to the left of the marker, shows bands of 4.2 kb and 3.7 kb. The cloning efficiencies for assembling TALENs containing recognition domains that identify Sequence 1 and Sequence 2 in a single reaction were 70% and 80%, respectively.

What is claimed is:
 1. A method for assembling a polynucleotide encoding a transcription activator-like effector (TALE) polypeptide, the method comprising: generating multiple deoxyribonucleic acid (DNA) segments, an individual DNA segment of the multiple DNA segments corresponding to a repeat sequence of the TALE polypeptide, the repeat sequence identifying a single nucleotide of a particular target polynucleotide or two contiguous nucleotides of the particular target polynucleotide; mixing the multiple DNA segments with restriction enzymes, DNA ligases, and a TALE backbone vector to generate the polynucleotide encoding the TALE polypeptide, a number of the multiple DNA segments being greater than 14; and purifying the polynucleotide using a plasmid-safe deoxyribonuclease (DNase).
 2. The method of claim 1, wherein the multiple DNA segments are polymerase chain reaction (PCR) amplification products or a plasmid.
 3. The method of claim 1, wherein the number of multiple DNA segments is greater than 18.
 4. The method of claim 1, wherein the mixing the multiple DNA segments with the restriction enzymes, the DNA ligases, and the TALE backbone vector comprises mixing the multiple DNA segments with the restriction enzymes, the DNA ligases, and the TALE backbone vector in a reaction system.
 5. The method of claim 1, wherein the mixing the multiple DNA segments with the restriction enzymes, the DNA ligases, and the TALE backbone vector comprises mixing the multiple DNA segments with the restriction enzymes, the DNA ligases, and the TALE backbone vectors under a condition including multiple temperature cycles.
 6. The method of claim 5, wherein the multiple temperature cycles are followed by inactivation of the restriction enzymes.
 7. The method of claim 1, wherein the mixing the multiple DNA segments with the restriction enzymes, the DNA ligases, and the TALE backbone vector comprises mixing the multiple DNA segments with the restriction enzymes that belong to one type of the type II restriction enzymes, the DNA ligases, and the TALE backbone vectors without other types of the type II restriction enzymes.
 8. The method of claim 1, wherein the individual DNA segment of the multiple DNA segments has two cleavage sites that are generated using type II restriction enzymes, and wherein the restriction enzymes are the type II restriction enzymes.
 9. A kit for generating polynucleotides encoding a transcription activator-like effector (TALE) polypeptide, the kit comprising: multiple sets of dimer DNA segments, an individual set of dimer DNA segments including multiple dimer DNA segments, an individual dimer DNA segment encoding two recognition domains of a designed TALE that identify two contiguous nucleotides of a particular target polynucleotide; multiple sets of monomer DNA segments, an individual set of monomer DNA segments including multiple monomer DNA modules, an individual monomer DNA module encoding a recognition domain of the designed TALE that identifies a single nucleotide of the particular target polynucleotide, wherein the individual dimer DNA segments and the individual monomer DNA segments include a first overhang and a second overhang that are generated using a type II restriction endonuclease; restriction enzymes; DNA ligases; and TALE backbone vectors.
 10. The kit of claim 9, wherein a sequence of a first overhang of a dimer DNA segment of the multiple dimer DNA segments is complementary to a sequence of a second overhang of a particular dimer DNA segment that is ligated to the dimer DNA segment.
 11. The kit of claim 9, wherein a sequence of a first overhang of a monomer DNA segment of the multiple monomer DNA segments is complementary to a sequence of a second overhang of a particular monomer DNA segment that is ligated to the monomer DNA segment.
 12. The kit of claim 9, wherein the type II restriction endonuclease is BsaI.
 13. The kit of claim 9, wherein a number of the multiple sets of dimer DNA segments is from 8 to 10, and a number of the multiple sets of monomer DNA segments is from 6 to 8.
 14. The kit of claim 9, wherein an individual set of dimer DNA segments includes 15 to 17 dimer DNA segments, and an individual set of monomer DNA segments includes 3 to 5 monomer DNA segments.
 15. The kit of claim 9, wherein the multiple dimer DNA segments and the multiple monomer DNA segments include an overlap extension polymerase chain reaction (PCR) product or a plasmid.
 16. The kit of claim 9, wherein the multiple dimer DNA segments and the multiple monomer DNA segments include pMD18-T.